# English speech recognition
Aero 1 Audio
MIT
A lightweight audio model that performs well at speech recognition, audio understanding, and following audio instructions, among other tasks.
Audio-to-Text
Transformers English

lmms-lab
1,348
74
Deepfake Audio Detection
Apache-2.0
A deepfake audio detection model fine-tuned from facebook/wav2vec2-base, achieving 95.45% accuracy on the evaluation set.
Audio Classification
Transformers

Heem2
246
0
Parakeet Tdt Ctc 1.1b
Parakeet TDT-CTC 1.1B is an automatic speech recognition model capable of transcribing English speech with punctuation and capitalization, jointly developed by NVIDIA NeMo and Suno.ai.
Speech Recognition English
nvidia
35.19k
18
Faster Whisper Medium.en
MIT
The CTranslate2-converted version of the OpenAI Whisper medium.en model, intended for efficient automatic speech recognition.
Speech Recognition English
Systran
65.17k
3
Wavlm Bart
A sequence-to-sequence model supporting English automatic speech recognition (ASR), capable of outputting normalized text, timestamp annotations, and multi-speaker segmentation.
Speech Recognition
Transformers English

nguyenvulebinh
24
2
Exp W2v2t En Vp Nl S281
Apache-2.0
An English speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli on the Common Voice 7.0 training set.
Speech Recognition
Transformers English

jonatasgrosman
18
0
Exp W2v2t En No Pretraining S289
Apache-2.0
An English speech recognition model that starts from a randomly initialized wav2vec2 architecture (no pretraining) and is trained on the Common Voice 7.0 dataset.
Speech Recognition
Transformers English

jonatasgrosman
18
0
Wav2vec2 2 Bart Large No Adapter
This model is an automatic speech recognition (ASR) model trained on the LibriSpeech ASR dataset, capable of converting English speech into text.
Speech Recognition
Transformers

sanchit-gandhi
22
0
Wav2vec2 2 Rnd
An automatic speech recognition model trained on the LibriSpeech ASR dataset, designed to convert English speech into text.
Speech Recognition
Transformers

sanchit-gandhi
16
0
Wav2vec2 Large Xlsr 53 English
Apache-2.0
An English speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice 6.1 dataset.
Speech Recognition English
jonatasgrosman
251.78k
471
Wav2vec2 Base 10k Voxpopuli Ft En
A Wav2Vec2 base model pre-trained on the 10K-hour unlabeled subset of the VoxPopuli corpus and fine-tuned on transcribed English data, suitable for English speech recognition.
Speech Recognition
Transformers English

facebook
40
1
Unispeech Sat Base Timit Ft
An automatic speech recognition model fine-tuned from microsoft/unispeech-sat-base on the TIMIT_ASR dataset, achieving a word error rate of 41.01% on the evaluation set.
Speech Recognition
Transformers

patrickvonplaten
15
0
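Several descriptions in this list quote a word error rate (WER). As a reminder of what that number measures, here is a minimal sketch that computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and plain whitespace tokenization with no text normalization is an assumption; this is not the evaluation script used for any model listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared exactly,
    with no lowercasing or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(word_error_rate("hello", "hello there big world"))
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 1.0 (100%) when the hypothesis contains many extra words. Note also that some cards report WER as a percentage and others as a raw ratio.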
Asr Wav2vec2 Commonvoice En
Apache-2.0
An end-to-end automatic speech recognition system trained on the CommonVoice English dataset, combining a pre-trained wav2vec 2.0 model with a CTC decoder.
Speech Recognition English
speechbrain
681
12
W2v Timit Ft 4001
A speech recognition model based on Wav2Vec 2.0 architecture, fine-tuned on the TIMIT dataset, suitable for English speech-to-text tasks
Speech Recognition
Transformers

devin132
22
0
Xlsr En Punctuation
Apache-2.0
An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the English Common Voice dataset, with support for punctuation prediction.
Speech Recognition English
boris
30.28k
3
Wav2vec2 Base Repro Timit
An automatic speech recognition model fine-tuned from patrickvonplaten/wav2vec2-base-repro-960h-libri-85k-steps on the TIMIT_ASR dataset.
Speech Recognition
Transformers

patrickvonplaten
20
0
Unispeech Sat Base Plus Timit Ft
An automatic speech recognition (ASR) model fine-tuned from microsoft/unispeech-sat-base-plus on the TIMIT_ASR dataset.
Speech Recognition
Transformers

patrickvonplaten
16
0
Wav2vec2 2 Bert Large No Adapter
An automatic speech recognition (ASR) model trained on the LibriSpeech dataset for converting English speech to text
Speech Recognition
Transformers

speech-seq2seq
15
1
Wav2vec2 Random
An automatic speech recognition model fine-tuned from the wav2vec2-base-random model on the TIMIT_ASR dataset.
Speech Recognition
Transformers

patrickvonplaten
16
0
Wav2vec2 Large English
Apache-2.0
An English automatic speech recognition model fine-tuned from facebook/wav2vec2-large on the Common Voice 6.1 dataset.
Speech Recognition
Transformers English

jonatasgrosman
355
4
Wav2vec2 Xls R 1b English
Apache-2.0
This is an English speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple English speech datasets.
Speech Recognition
Transformers English

jonatasgrosman
1,896
9
Wav2vec2 2 Bert Large No Adapter Frozen Enc
This model is a speech recognition model trained on the librispeech_asr dataset, achieving a word error rate (WER) of 2.0133 on the evaluation set.
Speech Recognition
Transformers

speech-seq2seq
25
2
Wav2vec2 Large Lv60 Timit Asr
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the timit_asr dataset.
Speech Recognition English
elgeish
13
0